Addressee detection for dialog systems using temporal and spectral dimensions of speaking style

Authors

Elizabeth Shriberg

Andreas Stolcke

Suman V. Ravuri

Abstract

As dialog systems evolve to handle unconstrained input and for use in open environments, addressee detection (detecting speech to the system versus to other people) becomes an increasingly important challenge. We study a corpus in which speakers talk both to a system and to each other, and model two dimensions of speaking style that talkers modify when changing addressee: speech rhythm and vocal effort. For each dimension we design features that do not require speech recognition output, session normalization, speaker normalization, or dialog context. Detection experiments show that rhythm and effort features are complementary, outperform lexical models based on recognized words, and reduce error rates even if word recognition is error-free. Simulated online processing experiments show that all features need only the first couple of seconds of speech. Finally, we find that temporal and spectral stylistic models can be trained on outside corpora, such as ATIS and ICSI meetings, with reasonable generalization to the target task, thus showing promise for domain-independent computer-versus-human addressee detectors.

Similar resources

Mon.O2b.04 Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog

New challenges arise for addressee detection when multiple people interact jointly with a spoken dialog system using unconstrained natural language. We study the problem of discriminating computer-directed from human-directed speech in a new corpus of human-human-computer (H-H-C) dialog, using lexical and prosodic features. The prosodic features use no word, context, or speaker information. Res...

Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog

Learning Models for Speaker, Addressee and Overlap Detection from Multimodal Streams

A key challenge in developing conversational systems is fusing streams of information provided by different sensors to make inferences about the behaviors and goals of people. Such systems can leverage visual and audio information collected through cameras and microphone arrays, including the location of various people, their focus of attention, body pose, the sound source direction, prosody, a...

Neural network models for lexical addressee detection

Addressee detection for dialog systems aims to detect which utterances are directed at the system, as opposed to someone else. An important means for classification is the lexical content of the utterance, and N-gram models have been shown to be effective for this task. In this paper we investigate whether neural networks can enhance lexical addressee detection, using data from a human-human-co...

Prosodic Entrainment in an Information-Driven Dialog System

This paper explores entrainment of two speaking styles, shouting and hyperarticulation, in an information-driven spoken dialog system. Both styles present difficulties for automatic speech recognition. We describe and evaluate the system’s detection and reaction mechanisms for these speaking styles, which involve deploying appropriate dialog-level strategies. The three strategies tested do indu...

Publication date: 2013
